home *** CD-ROM | disk | FTP | other *** search
Text File | 1991-06-01 | 32.8 KB | 1,467 lines | [TEXT/MPS ] |
- .TH GAWK 1 "August 24 1989" "Free Software Foundation"
- .SH NAME
- gawk \- pattern scanning and processing language
- .SH SYNOPSIS
- .B gawk
- .ig
- [
- .B \-d
- ] [
- .B \-D
- ]
- ..
- [
- .B \-a
- ] [
- .B \-e
- ] [
- .B \-c
- ] [
- .B \-C
- ] [
- .B \-V
- ] [
- .BI \-F\^ fs
- ] [
- .B \-v
- .IR var = val
- ]
- .B \-f
- .I program-file
- [
- .B \-\^\-
- ] file .\^.\^.
- .br
- .B gawk
- .ig
- [
- .B \-d
- ] [
- .B \-D
- ]
- ..
- [
- .B \-a
- ] [
- .B \-e
- ] [
- .B \-c
- ] [
- .B \-C
- ] [
- .B \-V
- ] [
- .BI \-F\^ fs
- ] [
- .B \-v
- .IR var = val
- ] [
- .B \-\^\-
- ]
- .I program-text
- file .\^.\^.
- .SH DESCRIPTION
- .I Gawk
- is the GNU Project's implementation of the AWK programming language.
- It conforms to the definition and description of the language in
- .IR "The AWK Programming Language" ,
- by Aho, Kernighan, and Weinberger,
- with the additional features defined in the System V Release 4 version
- of \s-1UNIX\s+1
- .IR awk ,
- and some GNU-specific extensions.
- .PP
- The command line consists of options to
- .I gawk
- itself, the AWK program text (if not supplied via the
- .B \-f
- option), and values to be made
- available in the
- .B ARGC
- and
- .B ARGV
- pre-defined AWK variables.
- .PP
- .I Gawk
- accepts the following options, which should be available on any implementation
- of the AWK language.
- .TP
- .BI \-F fs
- Use
- .I fs
- for the input field separator (the value of the
- .B FS
- predefined
- variable).
- .TP
- \fB\-v\fI var\fR\^=\^\fIval\fR
- Assign the value
- .IR val ,
- to the variable
- .IR var ,
- before execution of the program begins.
- Such variable values are available to the
- .B BEGIN
- block of an AWK program.
- .TP
- .BI \-f " program-file"
- Read the AWK program source from the file
- .IR program-file ,
- instead of from the first command line argument.
- Multiple
- .B \-f
- options may be used.
- .TP
- .B \-\^\-
- Signal the end of options. This is useful to allow further arguments to the
- AWK program itself to start with a ``\-''.
- This is mainly for consistency with the argument parsing convention used
- by most other System V programs.
- .PP
- The following options are specific to the GNU implementation.
- .TP
- .B \-a
- Use AWK style regular expressions as described in the book.
- This is the current default, but may not be when the POSIX P1003.2
- standard is finalized.
- It is orthogonal to
- .BR \-c .
- .TP
- .B \-e
- Use
- .IR egrep (1)
- style regular expressions as described in POSIX standard.
- This may become the default when the POSIX P1003.2
- standard is finalized.
- It is orthogonal to
- .BR \-c .
- .TP
- .B \-c
- Run in
- .I compatibility
- mode. In compatibility mode,
- .I gawk
- behaves identically to \s-1UNIX\s+1
- .IR awk ;
- none of the GNU-specific extensions are recognized.
- .TP
- .B \-C
- Print the short version of the GNU copyright information message on
- the error output.
- This option may disappear in a future version of
- .IR gawk .
- .TP
- .B \-V
- Print version information for this particular copy of
- .I gawk
- on the error output.
- This is useful mainly for knowing if the current copy of
- .I gawk
- on your system
- is up to date with respect to whatever the Free Software Foundation
- is distributing.
- This option may disappear in a future version of
- .IR gawk .
- .PP
- Any other options are flagged as illegal, but are otherwise ignored.
- .PP
- An AWK program consists of a sequence of pattern-action statements
- and optional function definitions.
- .RS
- .PP
- \fIpattern\fB { \fIaction statements\fB }\fR
- .br
- \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
- .RE
- .PP
- .I Gawk
- first reads the program source from the
- .IR program-file (s)
- if specified, or from the first non-option argument on the command line.
- The
- .B \-f
- option may be used multiple times on the command line.
- .I Gawk
- will read the program text as if all the
- .IR program-file s
- had been concatenated together. This is useful for building libraries
- of AWK functions, without having to include them in each new AWK
- program that uses them. To use a library function in a file from a
- program typed in on the command line, specify
- .B /dev/tty
- as one of the
- .IR program-file s,
- type your program, and end it with a
- .B ^D
- (control-d).
- .PP
- The environment variable
- .B AWKPATH
- specifies a search path to use when finding source files named with
- the
- .B \-f
- option. If this variable does not exist, the default path is
- \fB".:/usr/lib/awk:/usr/local/lib/awk"\fR.
- If a file name given to the
- .B \-f
- option contains a ``/'' character, no path search is performed.
- .PP
- .I Gawk
- compiles the program into an internal form,
- executes the code in the
- .B BEGIN
- block(s) (if any),
- and then proceeds to read
- each file named in the
- .B ARGV
- array.
- If there are no files named on the command line,
- .I gawk
- reads the standard input.
- .PP
- If a ``file'' named on the command line has the form
- .IB var = val
- it is treated as a variable assignment. The variable
- .I var
- will be assigned the value
- .IR val .
- This is most useful for dynamically assigning values to the variables
- AWK uses to control how input is broken into fields and records. It
- is also useful for controlling state if multiple passes are needed over
- a single data file.
- .PP
- For each line in the input,
- .I gawk
- tests to see if it matches any
- .I pattern
- in the AWK program.
- For each pattern that the line matches, the associated
- .I action
- is executed.
- .SH VARIABLES AND FIELDS
- AWK variables are dynamic; they come into existence when they are
- first used. Their values are either floating-point numbers or strings,
- depending upon how they are used. AWK also has one dimension
- arrays; multiply dimensioned arrays may be simulated.
- There are several pre-defined variables that AWK sets as a program
- runs; these will be described as needed and summarized below.
- .SS Fields
- .PP
- As each input line is read,
- .I gawk
- splits the line into
- .IR fields ,
- using the value of the
- .B FS
- variable as the field separator.
- If
- .B FS
- is a single character, fields are separated by that character.
- Otherwise,
- .B FS
- is expected to be a full regular expression.
- In the special case that
- .B FS
- is a single blank, fields are separated
- by runs of blanks and/or tabs.
- Note that the value of
- .B IGNORECASE
- (see below) will also affect how fields are split when
- .B FS
- is a regular expression.
- .PP
- Each field in the input line may be referenced by its position,
- .BR $1 ,
- .BR $2 ,
- and so on.
- .B $0
- is the whole line. The value of a field may be assigned to as well.
- Fields need not be referenced by constants:
- .RS
- .PP
- .ft B
- n = 5
- .br
- print $n
- .ft R
- .RE
- .PP
- prints the fifth field in the input line.
- The variable
- .B NF
- is set to the total number of fields in the input line.
- .PP
- References to non-existent fields (i.e. fields after
- .BR $NF ),
- produce the null-string. However, assigning to a non-existent field
- (e.g.,
- .BR "$(NF+2) = 5" )
- will increase the value of
- .BR NF ,
- create any intervening fields with the null string as their value, and
- cause the value of
- .B $0
- to be recomputed, with the fields being separated by the value of
- .BR OFS .
- .SS Built-in Variables
- .PP
- AWK's built-in variables are:
- .PP
- .RS
- .TP \l'\fBIGNORECASE\fR'
- .B ARGC
- the number of command line arguments (does not include options to
- .IR gawk ,
- or the program source).
- .TP \l'\fBIGNORECASE\fR'
- .B ARGV
- array of command line arguments. The array is indexed from
- 0 to
- .B ARGC
- \- 1.
- Dynamically changing the contents of
- .B ARGV
- can control the files used for data.
- .TP \l'\fBIGNORECASE\fR'
- .B ENVIRON
- An array containing the values of the current environment.
- The array is indexed by the environment variables, each element being
- the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
- .BR /u/arnold ).
- Changing this array does not affect the environment seen by programs which
- .I gawk
- spawns via redirection or the
- .B system
- function.
- (This may change in a future version of
- .IR gawk .)
- .TP \l'\fBIGNORECASE\fR'
- .B FILENAME
- the name of the current input file.
- If no files are specified on the command line, the value of
- .B FILENAME
- is ``\-''.
- .TP \l'\fBIGNORECASE\fR'
- .B FNR
- the input record number in the current input file.
- .TP \l'\fBIGNORECASE\fR'
- .B FS
- the input field separator, a blank by default.
- .TP \l'\fBIGNORECASE\fR'
- .B IGNORECASE
- Controls the case-sensitivity of all regular expression operations. If
- .B IGNORECASE
- has a non-zero value, then pattern matching in rules,
- field splitting with
- .BR FS ,
- regular expression
- matching with
- .B ~
- and
- .BR !~ ,
- and the
- .BR gsub() ,
- .BR index() ,
- .BR match() ,
- .BR split() ,
- and
- .B sub()
- pre-defined functions will all ignore case when doing regular expression
- operations. Thus, if
- .B IGNORECASE
- is not equal to zero,
- .B /aB/
- matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
- and \fB"AB"\fP.
- As with all AWK variables, the initial value of
- .B IGNORECASE
- is zero, so all regular expression operations are normally case-sensitive.
- .TP \l'\fBIGNORECASE\fR'
- .B NF
- the number of fields in the current input record.
- .TP \l'\fBIGNORECASE\fR'
- .B NR
- the total number of input records seen so far.
- .TP \l'\fBIGNORECASE\fR'
- .B OFMT
- the output format for numbers,
- .B %.6g
- by default.
- .TP \l'\fBIGNORECASE\fR'
- .B OFS
- the output field separator, a blank by default.
- .TP \l'\fBIGNORECASE\fR'
- .B ORS
- the output record separator, by default a newline.
- .TP \l'\fBIGNORECASE\fR'
- .B RS
- the input record separator, by default a newline.
- .B RS
- is exceptional in that only the first character of its string
- value is used for separating records. If
- .B RS
- is set to the null string, then records are separated by
- blank lines.
- When
- .B RS
- is set to the null string, then the newline character always acts as
- a field separator, in addition to whatever value
- .B FS
- may have.
- .TP \l'\fBIGNORECASE\fR'
- .B RSTART
- the index of the first character matched by
- .BR match() ;
- 0 if no match.
- .TP \l'\fBIGNORECASE\fR'
- .B RLENGTH
- the length of the string matched by
- .BR match() ;
- \-1 if no match.
- .TP \l'\fBIGNORECASE\fR'
- .B SUBSEP
- the character used to separate multiple subscripts in array
- elements, by default \fB"\e034"\fR.
- .RE
- .SS Arrays
- .PP
- Arrays are subscripted with an expression between square brackets
- .RB ( [ " and " ] ).
- If the expression is an expression list
- .RI ( expr ", " expr " ...)"
- then the array subscript is a string consisting of the
- concatenation of the (string) value of each expression,
- separated by the value of the
- .B SUBSEP
- variable.
- This facility is used to simulate multiply dimensioned
- arrays. For example:
- .PP
- .RS
- .ft B
- i = "A" ;\^ j = "B" ;\^ k = "C"
- .br
- x[i, j, k] = "hello, world\en"
- .ft R
- .RE
- .PP
- assigns the string \fB"hello, world\en"\fR to the element of the array
- .B x
- which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
- are associative, i.e. indexed by string values.
- .PP
- The special operator
- .B in
- may be used in an
- .B if
- or
- .B while
- statement to see if an array has an index consisting of a particular
- value.
- .PP
- .RS
- .ft B
- .nf
- if (val in array)
- print array[val]
- .fi
- .ft
- .RE
- .PP
- If the array has multiple subscripts, use
- .BR "(i, j) in array" .
- .PP
- The
- .B in
- construct may also be used in a
- .B for
- loop to iterate over all the elements of an array.
- .PP
- An element may be deleted from an array using the
- .B delete
- statement.
- .SS Variable Typing
- .PP
- Variables and fields
- may be (floating point) numbers, or strings, or both. How the
- value of a variable is interpreted depends upon its context. If used in
- a numeric expression, it will be treated as a number, if used as a string
- it will be treated as a string.
- .PP
- To force a variable to be treated as a number, add 0 to it; to force it
- to be treated as a string, concatenate it with the null string.
- .PP
- The AWK language defines comparisons as being done numerically if
- possible, otherwise one or both operands are converted to strings and
- a string comparison is performed.
- .PP
- Uninitialized variables have the numeric value 0 and the string value ""
- (the null, or empty, string).
- .SH PATTERNS AND ACTIONS
- AWK is a line oriented language. The pattern comes first, and then the
- action. Action statements are enclosed in
- .B {
- and
- .BR } .
- Either the pattern may be missing, or the action may be missing, but,
- of course, not both. If the pattern is missing, the action will be
- executed for every single line of input.
- A missing action is equivalent to
- .RS
- .PP
- .B "{ print }"
- .RE
- .PP
- which prints the entire line.
- .PP
- Comments begin with the ``#'' character, and continue until the
- end of the line.
- Blank lines may be used to separate statements.
- Normally, a statement ends with a newline, however, this is not the
- case for lines ending in
- a ``,'', ``{'', ``?'', ``:'', ``&&'', or ``||''.
- Lines ending in
- .B do
- or
- .B else
- also have their statements automatically continued on the following line.
- In other cases, a line can be continued by ending it with a ``\e'',
- in which case the newline will be ignored.
- .PP
- Multiple statements may
- be put on one line by separating them with a ``;''.
- This applies to both the statements within the action part of a
- pattern-action pair (the usual case),
- and to the pattern-action statements themselves.
- .SS Patterns
- AWK patterns may be one of the following:
- .PP
- .RS
- .nf
- .B BEGIN
- .B END
- .BI / "regular expression" /
- .I "relational expression"
- .IB pattern " && " pattern
- .IB pattern " || " pattern
- .IB pattern " ? " pattern " : " pattern
- .BI ( pattern )
- .BI ! " pattern"
- .IB pattern1 ", " pattern2"
- .fi
- .RE
- .PP
- .B BEGIN
- and
- .B END
- are two special kinds of patterns which are not tested against
- the input.
- The action parts of all
- .B BEGIN
- patterns are merged as if all the statements had
- been written in a single
- .B BEGIN
- block. They are executed before any
- of the input is read. Similarly, all the
- .B END
- blocks are merged,
- and executed when all the input is exhausted (or when an
- .B exit
- statement is executed).
- .B BEGIN
- and
- .B END
- patterns cannot be combined with other patterns in pattern expressions.
- .B BEGIN
- and
- .B END
- patterns cannot have missing action parts.
- .PP
- For
- .BI / "regular expression" /
- patterns, the associated statement is executed for each input line that matches
- the regular expression.
- Regular expressions are the same as those in
- .IR egrep (1),
- and are summarized below.
- .PP
- A
- .I "relational expression"
- may use any of the operators defined below in the section on actions.
- These generally test whether certain fields match certain regular expressions.
- .PP
- The
- .BR && ,
- .BR || ,
- and
- .B !
- operators are logical AND, logical OR, and logical NOT, respectively, as in C.
- They do short-circuit evaluation, also as in C, and are used for combining
- more primitive pattern expressions. As in most languages, parentheses
- may be used to change the order of evaluation.
- .PP
- The
- .B ?\^:
- operator is like the same operator in C. If the first pattern is true
- then the pattern used for testing is the second pattern, otherwise it is
- the third. Only one of the second and third patterns is evaluated.
- .PP
- The
- .IB pattern1 ", " pattern2"
- form of an expression is called a range pattern.
- It matches all input lines starting with a line that matches
- .IR pattern1 ,
- and continuing until a line that matches
- .IR pattern2 ,
- inclusive. It does not combine with any other sort of pattern expression.
- .SS Regular Expressions
- Regular expressions are the extended kind found in
- .IR egrep .
- They are composed of characters as follows:
- .RS
- .TP \l'[^abc...]'
- .I c
- matches the non-metacharacter
- .IR c .
- .TP \l'[^abc...]'
- .I \ec
- matches the literal character
- .IR c .
- .TP \l'[^abc...]'
- .B .
- matches any character except newline.
- .TP \l'[^abc...]'
- .B ^
- matches the beginning of a line or a string.
- .TP \l'[^abc...]'
- .B $
- matches the end of a line or a string.
- .TP \l'[^abc...]'
- .BI [ abc... ]
- character class, matches any of the characters
- .IR abc... .
- .TP \l'[^abc...]'
- .BI [^ abc... ]
- negated character class, matches any character except
- .I abc...
- and newline.
- .TP \l'[^abc...]'
- .IB r1 | r2
- alternation: matches either
- .I r1
- or
- .IR r2 .
- .TP \l'[^abc...]'
- .I r1r2
- concatenation: matches
- .IR r1 ,
- and then
- .IR r2 .
- .TP \l'[^abc...]'
- .IB r +
- matches one or more
- .IR r 's.
- .TP \l'[^abc...]'
- .IB r *
- matches zero or more
- .IR r 's.
- .TP \l'[^abc...]'
- .IB r ?
- matches zero or one
- .IR r 's.
- .TP \l'[^abc...]'
- .BI ( r )
- grouping: matches
- .IR r .
- .RE
- The escape sequences that are valid in string constants (see below)
- are also legal in regular expressions.
- .SS Actions
- Action statements are enclosed in braces,
- .B {
- and
- .BR } .
- Action statements consist of the usual assignment, conditional, and looping
- statements found in most languages. The operators, control statements,
- and input/output statements
- available are patterned after those in C.
- .SS Operators
- .PP
- The operators in AWK, in order of increasing precedence, are
- .PP
- .RS
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B "= += \-= *= /= %= ^="
- Assignment. Both absolute assignment
- .BI ( var " = " value )
- and operator-assignment (the other forms) are supported.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B ?:
- The C conditional expression. This has the form
- .IB expr1 " ? " expr2 " : " expr3\c
- \&. If
- .I expr1
- is true, the value of the expression is
- .IR expr2 ,
- otherwise it is
- .IR expr3 .
- Only one of
- .I expr2
- and
- .I expr3
- is evaluated.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B ||
- logical OR.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B &&
- logical AND.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B "~ !~"
- regular expression match, negated match.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B "< <= > >= != =="
- the regular relational operators.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .I blank
- string concatenation.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B "+ \-"
- addition and subtraction.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B "* / %"
- multiplication, division, and modulus.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B "+ \- !"
- unary plus, unary minus, and logical negation.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B ^
- exponentiation (\fB**\fR may also be used, and \fB**=\fR for
- the assignment operator).
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B "++ \-\^\-"
- increment and decrement, both prefix and postfix.
- .TP \l'\fB= += \-= *= /= %= ^=\fR'
- .B $
- field reference.
- .RE
- .SS Control Statements
- .PP
- The control statements are
- as follows:
- .PP
- .RS
- .nf
- \fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
- \fBwhile (\fIcondition\fB) \fIstatement \fR
- \fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
- \fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
- \fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
- \fBbreak\fR
- \fBcontinue\fR
- \fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
- \fBexit\fR [ \fIexpression\fR ]
- \fB{ \fIstatements \fB}
- .fi
- .RE
- .SS "I/O Statements"
- .PP
- The input/output statements are as follows:
- .PP
- .RS
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI close( filename )
- close file (or pipe, see below).
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .B getline
- set
- .B $0
- from next input record; set
- .BR NF ,
- .BR NR ,
- .BR FNR .
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI "getline <" file
- set
- .B $0
- from next record of
- .IR file ;
- set
- .BR NF .
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI getline " var"
- set
- .I var
- from next input record; set
- .BR NF ,
- .BR FNR .
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI getline " var" " <" file
- set
- .I var
- from next record of
- .IR file .
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .B next
- Stop processing the current input record. The next input record
- is read and processing starts over with the first pattern in the
- AWK program. If the end of the input data is reached, the
- .B END
- block(s), if any, are executed.
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .B print
- prints the current record.
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI print " expr-list"
- prints expressions.
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI print " expr-list" " >" file
- prints expressions on
- .IR file .
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI printf " fmt, expr-list"
- format and print.
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI printf " fmt, expr-list" " >" file
- format and print on
- .IR file .
- .TP \l'\fBprintf \fIfmt, expr-list\fR'
- .BI system( cmd-line )
- execute the command
- .IR cmd-line ,
- and return the exit status.
- (This may not be available on
- systems besides \s-1UNIX\s+1 and \s-1GNU\s+1.)
- .RE
- .PP
- Other input/output redirections are also allowed. For
- .B print
- and
- .BR printf ,
- .BI >> file
- appends output to the
- .IR file ,
- while
- .BI | " command"
- writes on a pipe.
- In a similar fashion,
- .IB command " | getline"
- pipes into
- .BR getline .
- .BR Getline
- will return 0 on end of file, and \-1 on an error.
- .SS The \fIprintf\fP Statement
- .PP
- The AWK versions of the
- .B printf
- and
- .B sprintf
- (see below)
- functions accept the following conversion specification formats:
- .RS
- .TP
- .B %c
- An ASCII character.
- If the argument used for
- .B %c
- is numeric, it is treated as a character and printed.
- Otherwise, the argument is assumed to be a string, and the only first
- character of that string is printed.
- .TP
- .B %d
- A decimal number (the integer part).
- .TP
- .B %i
- Just like
- .BR %d .
- .TP
- .B %e
- A floating point number of the form
- .BR [\-]d.ddddddE[+\^\-]dd .
- .TP
- .B %f
- A floating point number of the form
- .BR [\-]ddd.dddddd .
- .TP
- .B %g
- Use
- .B e
- or
- .B f
- conversion, whichever is shorter, with nonsignificant zeros suppressed.
- .TP
- .B %o
- An unsigned octal number (again, an integer).
- .TP
- .B %s
- A character string.
- .TP
- .B %x
- An unsigned hexadecimal number (an integer).
- .TP
- .B %X
- Like
- .BR %x ,
- but using
- .B ABCDEF
- instead of
- .BR abcdef .
- .TP
- .B %%
- A single
- .B %
- character; no argument is converted.
- .RE
- .PP
- There are optional, additional parameters that may lie between the
- .B %
- and the control letter:
- .RS
- .TP
- .B \-
- The expression should be left-justified within its field.
- .TP
- .I width
- The field should be padded to this width. If the number has a leading
- zero, then the field will be padded with zeros.
- Otherwise it is padded with blanks.
- .TP
- .BI . prec
- A number indicating the maximum width of strings or digits to the right
- of the decimal point.
- .RE
- .PP
- The dynamic
- .I width
- and
- .I prec
- capabilities of the C library
- .B printf
- routines are not supported.
- However, they may be simulated by using
- the AWK concatenation operation to build up
- a format specification dynamically.
- .SS Special File Names
- .PP
- When doing I/O redirection from either
- .B print
- or
- .B printf
- into a file,
- or via
- .B getline
- from a file,
- .I gawk
- recognizes certain special filenames internally. These filenames
- allow access to open file descriptors inherited from
- .IR gawk 's
- parent process (usually the shell). The filenames are:
- .RS
- .TP
- .B /dev/stdin
- The standard input.
- .TP
- .B /dev/stdout
- The standard output.
- .TP
- .B /dev/stderr
- The standard error output.
- .TP
- .BI /dev/fd/\^ n
- The file denoted by the open file descriptor
- .IR n .
- .RE
- .PP
- These are particularly useful for error messages. For example:
- .PP
- .RS
- .ft B
- print "You blew it!" > "/dev/stderr"
- .ft R
- .RE
- .PP
- whereas you would otherwise have to use
- .PP
- .RS
- .ft B
- print "You blew it!" | "cat 1>&2"
- .ft R
- .RE
- .PP
- These file names may also be used on the command line to name data files.
- .SS Numeric Functions
- .PP
- AWK has the following pre-defined arithmetic functions:
- .PP
- .RS
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI atan2( y , " x" )
- returns the arctangent of
- .I y/x
- in radians.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI cos( expr )
- returns the cosine in radians.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI exp( expr )
- the exponential function.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI int( expr )
- truncates to integer.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI log( expr )
- the natural logarithm function.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .B rand()
- returns a random number between 0 and 1.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI sin( expr )
- returns the sine in radians.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI sqrt( expr )
- the square root function.
- .TP \l'\fBsrand(\fIexpr\fB)\fR'
- .BI srand( expr )
- use
- .I expr
- as a new seed for the random number generator. If no
- .I expr
- is provided, the time of day will be used.
- The return value is the previous seed for the random
- number generator.
- .RE
- .SS String Functions
- .PP
- AWK has the following pre-defined string functions:
- .PP
- .RS
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- \fBgsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
- for each substring matching the regular expression
- .I r
- in the string
- .IR t ,
- substitute the string
- .IR s ,
- and return the number of substitutions.
- If
- .I t
- is not supplied, use
- .BR $0 .
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- .BI index( s , " t" )
- returns the index of the string
- .I t
- in the string
- .IR s ,
- or 0 if
- .I t
- is not present.
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- .BI length( s )
- returns the length of the string
- .IR s .
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- .BI match( s , " r" )
- returns the position in
- .I s
- where the regular expression
- .I r
- occurs, or 0 if
- .I r
- is not present, and sets the values of
- .B RSTART
- and
- .BR RLENGTH .
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- \fBsplit(\fIs\fB, \fIa\fB, \fIr\fB)\fR
- splits the string
- .I s
- into the array
- .I a
- on the regular expression
- .IR r ,
- and returns the number of fields. If
- .I r
- is omitted,
- .B FS
- is used instead.
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- .BI sprintf( fmt , " expr-list" )
- prints
- .I expr-list
- according to
- .IR fmt ,
- and returns the resulting string.
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- \fBsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
- this is just like
- .BR gsub ,
- but only the first matching substring is replaced.
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- \fBsubstr(\fIs\fB, \fIi\fB, \fIn\fB)\fR
- returns the
- .IR n -character
- substring of
- .I s
- starting at
- .IR i .
- If
- .I n
- is omitted, the rest of
- .I s
- is used.
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- .BI tolower( str )
- returns a copy of the string
- .IR str ,
- with all the upper-case characters in
- .I str
- translated to their corresponding lower-case counterparts.
- Non-alphabetic characters are left unchanged.
- .TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
- .BI toupper( str )
- returns a copy of the string
- .IR str ,
- with all the lower-case characters in
- .I str
- translated to their corresponding upper-case counterparts.
- Non-alphabetic characters are left unchanged.
- .RE
- .SS String Constants
- .PP
- String constants in AWK are sequences of characters enclosed
- between double quotes (\fB"\fR). Within strings, certain
- .I "escape sequences"
- are recognized, as in C. These are:
- .PP
- .RS
- .TP \l'\fB\e\fIddd\fR'
- .B \e\e
- A literal backslash.
- .TP \l'\fB\e\fIddd\fR'
- .B \ea
- The ``alert'' character; usually the ASCII BEL character.
- .TP \l'\fB\e\fIddd\fR'
- .B \eb
- backspace.
- .TP \l'\fB\e\fIddd\fR'
- .B \ef
- form-feed.
- .TP \l'\fB\e\fIddd\fR'
- .B \en
- new line.
- .TP \l'\fB\e\fIddd\fR'
- .B \er
- carriage return.
- .TP \l'\fB\e\fIddd\fR'
- .B \et
- horizontal tab.
- .TP \l'\fB\e\fIddd\fR'
- .B \ev
- vertical tab.
- .TP \l'\fB\e\fIddd\fR'
- .BI \ex "\^hex digits"
- The character represented by the string of hexadecimal digits following
- the
- .BR \ex .
- As in ANSI C, all following hexadecimal digits are considered part of
- the escape sequence.
- (This feature should tell us something about language design by committee.)
- E.g., "\ex1B" is the ASCII ESC (escape) character.
- .TP \l'\fB\e\fIddd\fR'
- .BI \e ddd
- The character represented by the 1-, 2-, or 3-digit sequence of octal
- digits. E.g. "\e033" is the ASCII ESC (escape) character.
- .TP \l'\fB\e\fIddd\fR'
- .BI \e c
- The literal character
- .IR c\^ .
- .RE
- .PP
- The escape sequences may also be used inside constant regular expressions
- (e.g.,
- .B "/[\ \et\ef\en\er\ev]/"
- matches whitespace characters).
- .SH FUNCTIONS
- Functions in AWK are defined as follows:
- .PP
- .RS
- \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
- .RE
- .PP
- Functions are executed when called from within the action parts of regular
- pattern-action statements. Actual parameters supplied in the function
- call are used to instantiate the formal parameters declared in the function.
- Arrays are passed by reference, other variables are passed by value.
- .PP
- Since functions were not originally part of the AWK language, the provision
- for local variables is rather clumsy: they are declared as extra parameters
- in the parameter list. The convention is to separate local variables from
- real parameters by extra spaces in the parameter list. For example:
- .PP
- .RS
- .ft B
- .nf
- function f(p, q, a, b) { # a & b are local
- ..... }
-
- /abc/ { ... ; f(1, 2) ; ... }
- .fi
- .ft R
- .RE
- .PP
- The left parenthesis in a function call is required
- to immediately follow the function name,
- without any intervening white space.
- This is to avoid a syntactic ambiguity with the concatenation operator.
- This restriction does not apply to the built-in functions listed above.
- .PP
- Functions may call each other and may be recursive.
- Function parameters used as local variables are initialized
- to the null string and the number zero upon function invocation.
- .PP
- The word
- .B func
- may be used in place of
- .BR function .
- .SH EXAMPLES
- .nf
- Print and sort the login names of all users:
-
- .ft B
- BEGIN { FS = ":" }
- { print $1 | "sort" }
-
- .ft R
- Count lines in a file:
-
- .ft B
- { nlines++ }
- END { print nlines }
-
- .ft R
- Precede each line by its number in the file:
-
- .ft B
- { print FNR, $0 }
-
- .ft R
- Concatenate and line number (a variation on a theme):
-
- .ft B
- { print NR, $0 }
- .ft R
- .fi
- .SH SEE ALSO
- .IR egrep (1)
- .PP
- .IR "The AWK Programming Language" ,
- Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
- Addison-Wesley, 1988. ISBN 0-201-07981-X.
- .PP
- .IR "The GAWK Manual" ,
- published by the Free Software Foundation, 1989.
- .SH SYSTEM V RELEASE 4 COMPATIBILITY
- A primary goal for
- .I gawk
- is compatibility with the latest version of \s-1UNIX\s+1
- .IR awk .
- To this end,
- .I gawk
- incorporates the following user visible
- features which are not described in the AWK book,
- but are part of
- .I awk
- in System V Release 4.
- .PP
- The
- .B \-v
- option for assigning variables before program execution starts is new.
- The book indicates that command line variable assignment happens when
- .I awk
- would otherwise open the argument as a file, which is after the
- .B BEGIN
- block is executed. However, in earlier implementations, when such an
- assignment appeared before any file names, the assignment would happen
- .I before
- the
- .B BEGIN
- block was run. Applications came to depend on this ``feature.''
- When
- .I awk
- was changed to match its documentation, this option was added to
- accomodate applications that depended upon the old behaviour.
- .PP
- When processing arguments,
- .I gawk
- uses the special option ``\fB\-\^\-\fP'' to signal the end of
- arguments, and warns about, but otherwise ignores, undefined options.
- .PP
- The AWK book does not define the return value of
- .BR srand() .
- The System V Release 4 version of \s-1UNIX\s+1
- .I awk
- has it return the seed it was using, to allow keeping track
- of random number sequences. Therefore
- .B srand()
- in
- .I gawk
- also returns its current seed.
- .PP
- Other new features are:
- The use of multiple
- .B \-f
- options; the
- .B ENVIRON
- array; the
- .BR \ea ,
- and
- .BR \ev ,
- .B \ex
- escape sequences; the
- .B tolower
- and
- .B toupper
- built-in functions; and the ANSI C conversion specifications in
- .BR printf .
- .SH GNU EXTENSIONS
- .I Gawk
- has some extensions to System V
- .IR awk .
- They are described in this section. All the extensions described here
- can be disabled by compiling
- .I gawk
- with
- .BR \-DSTRICT ,
- or by invoking
- .I gawk
- with the
- .B \-c
- option.
- If the underlying operating system supports the
- .B /dev/fd
- directory and corresponding files, then
- .I gawk
- can be compiled with
- .B \-DNO_DEV_FD
- to disable the special filename processing.
- .PP
- The following features of
- .I gawk
- are not available in
- System V
- .IR awk .
- .RS
- .TP \l'\(bu'
- \(bu
- The special file names available for I/O redirection are not recognized.
- .TP \l'\(bu'
- \(bu
- The
- .B IGNORECASE
- variable and its side-effects are not available.
- .TP \l'\(bu'
- \(bu
- No path search is performed for files named via the
- .B \-f
- option. Therefore the
- .B AWKPATH
- environment variable is not special.
- .TP \l'\(bu'
- \(bu
- The
- .BR \-a ,
- .BR \-e ,
- .BR \-c ,
- .BR \-C ,
- and
- .B \-V
- command line options.
- .RE
- .PP
- The AWK book does not define the return value of the
- .B close
- function.
- .IR Gawk\^ 's
- .B close
- returns the value from
- .IR fclose (3),
- or
- .IR pclose (3),
- when closing a file or pipe, respectively.
- .PP
- When
- .I gawk
- is invoked with the
- .B \-c
- option,
- if the
- .I fs
- argument to the
- .B \-F
- option is ``t'', then
- .B FS
- will be set to the tab character.
- Since this is a rather ugly special case, it is not the default behavior.
- .ig
- .PP
- The rest of the features described in this section may change at some time in
- the future, or may go away entirely.
- You should not write programs that depend upon them.
- .PP
- .I Gawk
- accepts the following additional options:
- .TP
- .B \-D
- Turn on general debugging and turn on
- .IR yacc (1)
- or
- .IR bison (1)
- debugging output during program parsing.
- This option should only be of interest to the
- .I gawk
- maintainers, and may not even be compiled into
- .IR gawk .
- .TP
- .B \-d
- Turn on general debugging and print the
- .I gawk
- internal tree as the program is executed.
- This option should only be of interest to the
- .I gawk
- maintainers, and may not even be compiled into
- .IR gawk .
- ..
- .SH BUGS
- The
- .B \-F
- option is not necessary given the command line variable assignment feature;
- it remains only for backwards compatibility.
- .PP
- There are now too many options.
- Fortunately, most of them are rarely needed.
- .SH AUTHORS
- The original version of \s-1UNIX\s+1
- .I awk
- was designed and implemented by Alfred Aho,
- Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
- continues to maintain and enhance it.
- .PP
- Paul Rubin and Jay Fenlason,
- of the Free Software Foundation, wrote
- .IR gawk ,
- to be compatible with the original version of
- .I awk
- distributed in Seventh Edition \s-1UNIX\s+1.
- John Woods contributed a number of bug fixes.
- David Trueman of Dalhousie University, with contributions
- from Arnold Robbins at Emory University, made
- .I gawk
- compatible with the new version of \s-1UNIX\s+1
- .IR awk .
- .SH ACKNOWLEDGEMENTS
- Brian Kernighan of Bell Labs
- provided valuable assistance during testing and debugging.
- We thank him.
-